submodular cover
An Efficient Streaming Algorithm for the Submodular Cover Problem
We initiate the study of the classical Submodular Cover (SC) problem in the data streaming model which we refer to as the Streaming Submodular Cover (SSC). We show that any single pass streaming algorithm using sublinear memory in the size of the stream will fail to provide any non-trivial approximation guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover. We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream. In our numerical experiments, we evaluate the performance of ESC-Streaming on active set selection and large-scale graph cover problems.
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Virginia (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Middle East > Israel (0.04)
Fast Distributed Submodular Cover: Public-Private Data Summarization
In this paper, we introduce the public-private framework of data summarization motivated by privacy concerns in personalized recommender systems and online social services. Such systems have usually access to massive data generated by a large pool of users. A major fraction of the data is public and is visible to (and can be used for) all users. However, each user can also contribute some private data that should not be shared with other users to ensure her privacy. The goal is to provide a succinct summary of massive dataset, ideally as small as possible, from which customized summaries can be built for each user, i.e. it can contain elements from the public data (for diversity) and users' private data (for personalization). To formalize the above challenge, we assume that the scoring function according to which a user evaluates the utility of her summary satisfies submodularity, a widely used notion in data summarization applications.
- Europe > United Kingdom (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > United States > Virginia (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Middle East > Israel (0.04)
Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
We investigate two new optimization problems -- minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 23] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and, an approximation algorithm solving one can be used to obtain an approximation guarantee for the other.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Submitted by Assigned_Reviewer_1 Q1 The paper proposes a modified version of the epsilon greedy algorithm for finding approximate solution to the generalized problem of submodular cover on the integer lattice. It is shown that the proposed algorithm provides a bicriteria approximation guarantee while having a running time which is polynomial in the input size. The paper is among a class of resent papers that have been working on accelerating and hence scaling up the optimization methods to be practical for large modern data sets (as is commonly seen in machine learning tasks). Q2 Overall, I think the paper is well-written and through and does a good job of providing theoretical guarantees for the proposed algorithms for submodular maximization over an integer lattice. The proposed problem has real world applications and the quality of the solution is shown through experiments on real and artificial datasets. Submitted by Assigned_Reviewer_2 Q1 PAPER SUMMARY The paper proposes a generalization of submodular cover to integer lattices.
Distributed Submodular Cover: Succinctly Summarizing Massive Data
How can one find a subset, ideally as small as possible, that well represents a massive dataset? I.e., its corresponding utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. Here, the utility is assumed to exhibit submodularity, a natural diminishing returns condition preva- lent in many data summarization applications. The classical greedy algorithm is known to provide solutions with logarithmic approximation guarantees compared to the optimum solution. However, this sequential, centralized approach is imprac- tical for truly large-scale problems. In this work, we develop the first distributed algorithm – DISCOVER – for submodular set cover that is easily implementable using MapReduce-style computations.
A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice
We consider a generalization of the submodular cover problem based on the concept of diminishing return property on the integer lattice. We are motivated by real scenarios in machine learning that cannot be captured by (traditional) submodular set functions. We show that the generalized submodular cover problem can be applied to various problems and devise a bicriteria approximation algorithm. Our algorithm is guaranteed to output a log-factor approximate solution that satisfies the constraints with the desired accuracy. The running time of our algorithm is roughly O(n\log (nr) \log{r}), where n is the size of the ground set and r is the maximum value of a coordinate.
Fair Submodular Cover
Chen, Wenjing, Xing, Shuo, Zhou, Samson, Crawford, Victoria G.
Submodular optimization is a fundamental problem with many applications in machine learning, often involving decision-making over datasets with sensitive attributes such as gender or age. In such settings, it is often desirable to produce a diverse solution set that is fairly distributed with respect to these attributes. Motivated by this, we initiate the study of Fair Submodular Cover (FSC), where given a ground set $U$, a monotone submodular function $f:2^U\to\mathbb{R}_{\ge 0}$, a threshold $\tau$, the goal is to find a balanced subset of $S$ with minimum cardinality such that $f(S)\ge\tau$. We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$. We then present a continuous algorithm that achieves a $(\ln\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio, which matches the best approximation guarantee of submodular cover without a fairness constraint. Finally, we complement our theoretical results with a number of empirical evaluations that demonstrate the effectiveness of our algorithms on instances of maximum coverage.
- North America > United States > Texas (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)